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ON  SEGMENTATION  OF  DIGITAL  IMAGES 
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STANLEY  L.  SCLOVE 
Department  of  Quantitative  Methods 
College  of  Business  Adainistration 
University  of  Illinois  at  Chicago 

ABSTRACT 

The  problea  of  partitioning  a  digital  iaage  into  segaents  is 
considered.  First  the  procedure  is  illustrated  for  the  analogous 
one-diaensional  problea,  naaely,  segaentation  of  tiae  series.  Then 
siailar  ideas  are  applied  to  the  segaentation  of  digital  iaages. 

The  segaents  are  considered  as  falling  into  classes.  A  probability 
distribution  is  associated  with  each  class  of  segaent.  Paraaetrie 
faailies  of  distributions  are  considered,  a  set  of  paraaeter  values 
being  associated  with  each  class.  With  each  observation  is  associated 
an  unobservable  label,  indicating  froa  which  class  the  observation 
arose.  The  label  process  is  aodeled  as  a  Markov  process.  Segaentation 
algorithsw  are  obtained  by  applying  a  aethod  of  iterated  aaxiaua 
likelihood  (a  relaxation  aethod)  to  the  resulting  likelihood  function. 

In  this  paper  special  attention  is  given  to  the  situation  in  which  the 
observations  are  conditionally  independent,  given  the  labels.  Numerical 
examples  are  given.  Choice  of  the  number  of  classes,  using 
statistical  model-selection  criteria,  is  illustrated.  - - — — - 

Key  words  and  phrases:  time-series  segmentation,  digital  image 
segaentation,  relaxation  aethod,  Viterbi  algorithm,  model-selection 
criteria,  information  criteria,  Akaike's  information  criterion. 
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1.  Introduction 

Problems  of  segmenting  time  series  and  digital  images  are  considered 
In  both  cases  the  observations  fall  into  classes,  and  this  assignment 
of  observations  to  classes,  or  "labeling,"  is  unknown.  Thus  each  point 
gives  rise  to  a  pair,  the  observation  itself,  together  with  its  unknown 
label.  In  the  context  of  this  model,  segmentation  is  estimation  of  the 
labels.  In  time  series  the  points  are  time  points;  in  digital  images, 
the  points  are  points  of  the  image  (picture- elements ,  or  pixels).  An 
image  is  two-dimensional,  while  time  is  one-dimensional,  so  time  series 
are  treated  first. 

2.  Segmentation  of  Time  Series 

The  problem  of  segmentation  considered  here  is:  Given  a  time  series 

{ x  ^ ,  t“l ,2, . . . , n} , 

partition  the  set  of  values  of  t  into  sub-series  (segments,  regimes) 
in  some  meaningful  way.  The  segments  are  assumed  to  fall  into 
several  classes.  The  classes  may  be  associated  with  different 
generating  mechanisms.  In  cyclic  processes  the  classes  are  phases 
of  the  cycle. 
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Examples .  (i)  Segment  a  received  signal  into  background,  target, 

background  again,  another  target,  etc.  (ii)  Segment  an  EEG  of  a 
sleeping  person  into  periods  of  deep  sleep  and  restless  or  fitful  sleep 
(two  classes  of  segment),  (iii)  Segment  an  ECG  into  rhythmic  and 
arhythmie  periods  (two  classes  of  segment),  (iv)  Segment  an  economic 
time  series  into  periods  of  recession,  recovery,  and  expansion.  Here 
there  are  three  classes  of  segment. 

In  some  applications  the  observation  X  is  a  vector  of  several  measure¬ 
ments.  E.g.,  for  blood  pressure,  X  is  a  vector  of  the  two  measurements , 
systolic  and  diastolic.  The  discussion  of  segmentation  of  digital  images, 
later  in  the  paper,  will  be  in  terms  of  vector  measurements.  Time  series 
will  be  discussed  in  terms  of  scalar  (single)  measurements,  although 
the  ideas  and  methods  readily  generalize  to  multiple  time  series. 

2.1.  The  Model 

One  can  imagine  a  series  which  is  usually  relatively  smooth  but  occa¬ 
sionally  rather  jumpy  as  being  composed  of  sub-series  which  are  first- 
order  autoregressive,  the  autocorrelation  coefficient  being  positive  for 
the  smooth  segments  and  negative  for  the  jumpy  ones.  One  might  try 
fitting  such  data  with  a  segmentation  of  two  classes,  one  corresponding 
to  a  positive  autocorrelation,  the  other,  to  a  negative  autocorrelation. 

The  mechanism  generating  the  process  changes  from  time  to  time,  and 
these  changes  manifest  themselves  at  some  unknown  time  points  (epochs,  or 
change-points) 

t  J  ,  t£,  •  •  •  .  tjB-1  * 

i.e.,  the  number  of  segments  is  m.  The  integer  m  and  the  epochs  are  unknown. 
Generally  there  will  be  fewer  than  m  generating  mechanisms.  The  number  of 
mechanisms  (classes)  will  be  denoted  by  k;  it  will  be  assumed  that  k  is  at 
most  m.  In  some  situations,  k  is  specified;  in  others,  it  is  not.  With  the 
c-th  class  is  associated  a  stochastic  process,  Pc,  say.  E.g.,  above  we 
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•poka  of  a  situation  with  k“2  elatiai,  whore,  for  c_l,2,  the  procaas  Pe 
ia  first-order  autoragraaaive  with  coafficient,  say,  4>c . 

Now  with  tha  t-th  obaarvation  (t“l,2, . . . ,n)  aasoeiate  the  label 
y which  ia  equal  to  e  if  and  only  if  xt  arose  from  class  c, 
c  “  1,2,..., k.  Each  tiaa-point  t  gives  rise  to  a  pair 

(xt,Vt) , 

where  is  observable  and  "y^  is  not.  The  process  {xt}  is  the 
observed  tiae  series;  will  be  called  the  "label  process." 

Define  a  segmentation,  then,  as  a  partition  of  the  tiae  index  set 
{t:  t  ■  l,2,...,n}  into  subsets 

S|  m  {l ,2, • • • , tj) ,  S2  “  {tj+1, . . . , t2^ » • • • »  SB  “  {tB_i+l, . . . ,n} , 
where  the  t's  are  subscripted  in  ascending  order.  Each  subset  Sg,  g  ■ 

1,2,..., a,  is  a  segment.  The  integer  m  is  not  specified.  In  the  context  of 
this  aodel,  to  segment  the  series  is  merely  to  estimate  the  y'a. 

The  idea  underlying  the  development  here  is  that  of  transitions  between 
classes.  The  labels  y ^  will  be  treated  as  random  variables 
in  a  stochastic  process  with  transition  probabilities 

Pr(7tmd|7t-l"c)  “  pcd, 

taken  as  stationary,  i.e.,  independent  of  t.  The  k-by-k  matrix  of  transition 
probabilities  will  be  denoted  by  P,  i.e., 

I  "  tPcdJ- 

If  a  process  is  strictly  cyclic,  like  intake,  compression,  combustion, 
intake,  etc.,  for  a  combustion  engine,  or  recession  to  recovery  to  expansion 
to  recession,  etc.,  in  the  business  cycle,  then  this  condition  can  be  imposed 
by  using  a  transition  probability  matrix  with  zeros  in  the  appropriate  places. 
We  shall  consider  such  a  matrix  in  Section  2.3.2. 
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Segmentation  will  involve  the  simultaneous  estimation  of  several  sets  of 
parameters,  the  distributional  parameters  of  the  within-elass  stochastic 
processes,  the  transition  probabilities,  and  the  labels. 

A  joint  probability  density  function  (p.d.f.)  for  {(Xt,yc),  t“l,2, . . . ,nl 
is  obtained  by  successively  conditioning  each  variable  on  all  the  preceding 
ones.  The  working  assumptions  behind  the  method  of  this  paper  are  as  follows. 
(A.l)  The  labels  are  a  first-order  stationary  Markov  chain,  independent  of  the 
observations;  i.e.,  the  probability  of  being  in  state  d  at  time  t+1 
given  state  c  at  time  t,  is  pC(j»  which  does  not  involve 
t  or  the  values  of  the  observations. 

(A. 2)  The  distribution  of  the  random  variable  depends  only  on  its  own 
label  and  previous  X's,  not  previous  labels. 

Further  details  and  a  mathematical  formulation  corresponding  to  these 
assumptions  are  given  in  Selove  (1983a). 

In  regard  to  (A. 2),  in  the  simplest  case  the  X's  are  (conditionally) 
independent,  given  the  labels.  That  is,  the  disribution  of  Xt  depends 
only  on  its  label,  and  not  previous  X's.  In  the  examples  in  the  present 
paper  this  assumption  is  made.  In  this  case  the  p.d.f. 's  f(x|y^*c), 
c  “  l,2,.,,k,  are  called  class-conditional  densities.  In  the  parametric 
case  they  take  the  fora 

f(xt|>t-c)  “  g(xt;8c),  (1) 

where  8  is  a  parameter  indexing  a  family  of  p.d.f. 's  of  form  given  by  the 
function  g.  E.g.,  in  the  case  of  Gaussian  class-conditional  distributions, 

8C  consists  of  the  mean  and  variance  for  the  c-th  class. 

This  model,  with  transition  probabilities,  has  certain  advantages  over 
a  model  formulated  in  terms  of  the  epochs.  The  epochs  behave  as 
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constrained  discrete  parameters,  and,  even  if  the  suitably  transformed 
generalized  likelihood  ratio  were  asymptotically  chi-square,  the  number 
of  degrees  of  freedom  would  not  be  clear.  On  the  other  hand,  the 
transition  probabilities  vary  in  an  interval  and  it  is  clear  that  they 
constitute  a  set  of  k(k-l)  free  parameters. 

2.2.  An  algorithm 

2.2.1.  Development  of  the  algorithm 
It  follows  from  the  assumptions  that  the  likelihood  L,  i.e.,  the 
joint  p.d.f,  considered  as  a  function  of  the  parameters,  can  be  written 
in  the  form 

•  L  -  A({pcd},Wt})B({YtM3c}).  <2> 

Hence,  for  fixed  values  of  the  y* s  and  $'s,  L  is  maximized  with  respect  to 
the  p's  by  maximizing  the  factor  A.  Now  let  ncd  be  the  number  of  transitions 

from  class  c  to  d.  (These  n's  are  functions  of  the  labels.)  The  factor  A 

is  merely  the  point  multinomial  probability  function,  the  parameters 
being  the  n's  and  p's.  It  follows  that  the  maximum  likelihood  estimates 
of  the  p's,  for  fixed  values  of  the  other  parameters,  are  given  by 

Pcd  "  ncd/nc  *  (3) 

where 

nc  *  nc 1  +  nc2  +  •••  +  nck  • 

Further,  given  the  p’s  and  t's,  the  estimates  of  the  distributional 
parameters — the  8's — are  easy  to  obtain.  This  suggests  the  following 
algorithm. 

Step  0.  Set  the  8's  at  initial  trial  values,  suggested,  e.g.,  by 
previous  knowledge  of  the  phenomenon  under  study.  Set  the  p's  at 
initial  trial  values,  e.g.,  1/k.  Set  f(yj)  at  initial  trial 
values,  e.g.,  f(>p  ■  1/k,  for  “  1,2,. ...k. 

Step  1.  Estimate  by  maximizing  f (yp f (x^ |yp . 
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Step  2.  For  t“2,3,...,n,  estiaate  7^  by  maximizing 

f  (*t  I  ”^t  *  Xt~l  *  *  *  *  *  XP  * 

as,  under  (A.l)  and  (A. 2),  the  likelihood  can  be  expressed  as  a  product 
of  such  factors. 

Step  3.  Now,  having  labeled  the  observations,  estimate  the  distributional 
paraaeters,  and  estiaate  the  transition  probabilities  according  to  (3). 

Step  4.  If  no  observation  has  changed  labels  from  the  previous  iteration, 
stop.  Otherwise,  repeat  the  procedure  from  Step  1. 

This  algorithm  is  a  "relaxation  method,"  in  the  sense  of 
Southwell  (1940,  1946).  That  is,  one  maximizes  successively  on  each 
variable,  holding  the  others  fixed. 

Having  cast  the  problem  in  statistical  terms  and  obtained  a 
likelihood,  one  can  score  successive  iterations  according  to  the 
likelihood  values.  This  solves  the  problem  of  which  iteration's 
results  to  use,  when  there  is  not  always  improvement  from  iteration  to 
iteration.  (See,  e.g.,  Richards,  Landgrebe  and  Swain  (1980)  for  a 
discusssion  of  this  problem  with  relaxation  methods  for  pixel 
labeling.) 

To  maximize  by  considering  the  factors  mentioned  in  Step  2  is  to  use 
the  Viterbi  algorithm  (Forney  1973).  (The  Viterbi  algorithm  is  a 
recursive  optimal  solution  to  the  problem  of  estimating  the  state 
sequence  of  a  discrete-time  finite  state  Markov  process.) 

Step  2  is  Bayesian  classification  of  xt.  Suppose  the  (t-l)-st 
observation  was  tentatively  classified  into  class  c.  Then  the  prior 
probability  that  the  t-th  observation  belongs  to  class  d  is  pcd,  d  ■ 
l,2,...,k.  Hence  all  the  techniques  for  classification  in  particular  models 
are  available  (e.g.,  use  of  linear  discriminant  functions  when  the 
observations  are  multivariate  normal  with  common  covariance  matrix). 


2.2.2.  The  first  iteration 


When  the  k  class-conditional  processes  consist  of  independent,  identically 
distributed  normally  distributed  random  variables  with  common  variance,  one 
can  start  by  choosing  initial  means  and  labelling  the  observations  by  a  minimum- 
distance  clustering  procedure.  [This  is  one  iteration  of  ISODATA  (Ball  and  Hall, 
1967).  One  could  iterate  further  at  this  stage.]  From  this  clustering 
initial  estimates  of  transition  probabilities  and  the  variance  are  obtained. 

2.2.3.  Restrictions  on  the  transitions 
As  mentioned  above,  one  might  wish  to  place  restrictions  on  the 
transitions,  e.g.,  to  allow  transitions  only  to  adjace  states.  (E.g., 
"recovery”  is  adjacent  to  "recession",  "expansion"  is  'acent  to  "recovery," 
but  "expansion"  is  not  adjacent  to  "recession.")  The  m  ,t  ioes  permit 
restrictions  on  the  transitions.  The  maximization  is  conducted,  subject  to 
the  condition  that  the  corresponding  transition  probabilities  are  zero.  This 
is  easily  implemented  in  the  algorithm.  Once  one  sets  a  given  transition 
probability  at  zero,  the  algorithm  will  fit  no  such  transitions,  and  the 
corresponding  transition  probability  will  remain  set  at  zero  at  every 
iteration. 


2.3.  An  example 

Here,  in  the  context  of  a  specific  numerical  example,  the  problem 
of  fitting  the  model  for  a  fixed  k  and  the  problem  of  choice  of  k 
will  be  illustrated.  (The  additional  problem  of  prediction  is  considered 
in  Sclove  1983a.) 

Quarterly  gross  national  product  (GNP)  in  current  (non-constant)  dollars 
for  the  twenty  years  1947  to  1966  was  considered.  (This  makes  a  good  size 
dataset  for  expository  purposes  here.)  Parameters  were  estimated  from  the 
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first  19  years,  the  last  four  observations  (1966)  being  saved  to  test 
the  accuracy  of  predictions.  (See  Sclove  1983a.)  The  data  and  first 
difference  are  given  in  Table  1.  The  raw  series  is  nonstationary,  so  the 
first  differences  (increases  in  quarterly  GNP)  were  analyzed.  (Study 
of  more  recent  data  suggested  use  of  first  or  second  differences  of  logs; 
see  Sclove  (1983b) .)  The  notation  is 

xt  -  GNPt+1  -  GNPt  ,  t  -  1,2,..., 79; 

e.g.,  GNPj  is  the  GNP  at  the  end  of  the  quarter  1947-1,  GNP2  is  that  at  the 
end  of  1947-2,  and  x^  “  GNP £  ~  GNP}  is  the  increase  in  GNP  during  the  second 
quarter  of  1947.  (A  negative  value  of  an  x  indicates  a  decrease  in  GNP 
for  the  corresponding  quarter.)  A  Gaussian  model  was  used. 

2.3.1.  Fitting  the  model 

In  this  section  we  discuss  the  fitting  of  a  model  with  k*3  classes, 
discussion  of  the  choice  of  k  being  deferred  to  the  next  section. 

The  three  classes  may  be  considered  as  corresponding  to  recession, 
recovery,  and  expansion,  although  some  may  prefer  to  think  of  the  segments 
labeled  as  recovery  as  level  periods  corresponding  to  peaks  and  troughs. 

The  approximate  maximum  likelihood  solution  found  by  the  iterative  procedure 
was  (units  are  billions  of  current  (non-constant)  dollars)  -1.3,  6.2,  and  12.3 
for  the  means,  2.28  for  the  standard  deviation,  and 
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for  the  transition  probabilities.  The  estimated  labels  are  given  below; 
labels  (r  ■  recession,  e  “  expansion)  resulting  from  fitting  k**2  classes  (see 
Section  2.3.2)  are  also  given. 


-  V  V  --.  -\  v 

, . .  —  .  »  ,  .  Urn  r r  .f-  -'-i 


Sclove:  On  Segmentation  of  Digital  Imagas  9 

*********************************************************** ****************** 


1-I-S 

t: 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

V£\* 

label. 

k-3: 

2 

2 

3 

2 

2 

2 

1 

1 

1 

1 

1 

3 

3 

3 

3 

3 

2 

2 

2 

2 

1 

2 

3 

label. 

k-2: 

r 

r 

e 

e 

e 

'  e 

r 

r 

r 

r 

r 

e 

e 

e 

e 

e 

e 

e 

e 

r 

r 

e 

e 

||| 

24 

25 

26  27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

2 

2 

1  1 

1 

1 

2 

2 

3 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

1 

1 

2 

3 

3 

3 

3 

1 

e 

r 

r  r 

r 

r 

r 

e 

e 

e 

e 

e 

r 

r 

r 

e 

e 

r 

e 

r 

r 

r 

e 

e 

e 

e 

r 

V  .Vi 

0 

••A* 

51 

52 

53  54 

55 

56 

57 

58 

59 

60 

61 

62 

63 

64 

65 

66 

67 

68 

69 

70 

71 

72 

73 

74 

75 

2 

3 

2  1 

1 

1 

3 

3 

3 

3 

3 

2 

2 

2 

2 

3 

3 

3 

3 

3 

2 

3 

3 

3 

3 

e 

e 

r  r 

r 

r 

e 

e 

a 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

e 

The  procasa  was  in  stata  1  for  21%  of  tha  tine,  in  state  2  for  44%  of 
the  tine,  and  in  state  3  for  35%  of  the  time.  The  labeling  corresponds 
well  to  the  conventional  wisdom  of  economists  concerning  recession 
during  these  years;  see  Sclove  (1983a). 

An  interesting  feature  of  the  model  and  the  algorithm  is  that,  as  the 
iterations  proceed,  soma  isolated  labels  change  to  conform  to  their  neighbors. 

This  should  be  the  case  when  pec  is  large  relative  to  pcd,  for  d  different  from  c. 
2.3.2.  Choice  of  number  of  classes 

Various  values  of  k  were  tried,  the  results  being  scored  by  means  of 
Akaike's  information  criterion  (AIC) .  (See,  e.g.,  Akaike  1981.) 

To  estimate  k  one  uses  the  minimum  AIC  estimate,  where 
AIC(k)  *  -21ogeCmax  L(k)3  +  2m (k) . 

Here  L(k)  is  the  likelihood  when  k  classes  are  used,  max  denotes  its  maximum 
over  the  parameters,  and  m(k)  is  the  number  of  independent  parameters  when  k 
classes  are  used.  The  statistic  AlC(k)  is  a  natural  estimate  of  the 
"cross-entropy"  between  f  and  g(k),  where  f  is  the  (unknown)  true 
density  and  g(k)  is  the  density  corresponding  to  the  model  with  k 
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classes.  (See,  e.g.,  Parzen  (1982)  for  a  discussion  of  the 
cross-entropy,  H(f;g).)  According  to  AIC,  inclusion  of  an  additional 
paraaeter  is  appropriate  if  loge[aax  L]  increases  by  one  unit  or 
aore,  i.e.,  if  aax  L  increases  by  a  factor  of  e  or  more.  It  aay  be 
generally  preferable  to  consider  the  Schwarz  (1978)  criterion 
-21ogc[aax  L(k)]  +  (loge  n)a(k). 

It  favors  models  with  fewer  parameters.  However,  in  this  particular  example, 
as  will  be  seend,  even  AIC  favors  the  aodel  with  thn  minimum  number 
of  paraaeters. 

The  aodel  was  fit  with  several  values  of  k  and  unrestricted  transition 
probabilities.  Also,  since  it  seems  reasonable  to  restrict  the  transitions 
to  those  between  adjacent  states,  such  models  were  evaluated  as  well.  In  the 
case  of  k*3,  where  the  states  might  be  considered  as  recession,  recovery, 
and  expansion,  this  means  setting  equal  to  zero  the  transition  probabilities 
coresponding  to  the  transitions,  recession-to-expansion  and  expansion-to- 
recession.  The  results  are  given  in  Table  2.  The  best  segmentation  model, 
as  indicated  by  ainimum  AIC,  is  that  with  only  two  classes. 

A  aodel  with  only  two  classes  enjoys  advantages  relating  to  its  relative 
simplicity. 

The  results  for  k*2  classes  (which  might  be  called  recession,  expansion) 
were  0.43  and  10.09  for  the  aeans,  3.306  for  the  standard  deviation,  and 

.667  .333 

.170  .830 

for  the  transition  probabilities.  The  process  was  in  state  1  for  372 
of  the  tiae  and  state  2  the  other  632  of  the  time.  The  labels  were 


given  above. 
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The  algorithm  has  baan  usad  with  good  success  to  segment  bovine 
teaperature  tiaa  series  into  menstrual  and  non-aenstrual  periods  and 
to  segment  a  time  series  of  influenza  deaths  into  epidemic  and 
non-epidemic  periods. 


3.  A  Model  and  Method  for  Segmentation  of  Digital  Images 

A  digital  (i.e.,  numerical)  image  may  be  considered  as  a  rectangular 
array  of  picture  eleaents  (pixels).  These  will  be  indexed  by  (i,j).  At 
each  pixel  the  saae  p  features  are  observed.  We  denote  the  features  by 

*1'  x2*  •••»  xp* 

The  vector  of  features  is 

X  -  <Xlf  X2 . Xp). 

The  digital  iaage  is 

(Xi j i  i-l , 2, ....  I ,  j  —  1.2y...|j}  f 

where  * 

£ij  “  ^xli  j •  x2ij*  •••»  xpij) 

is  the  vector  of  numerical  values  of  the  p  features  at  pixel  (i,j). 
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pixels,  and,  also,  the  image  consists  of  several  segments.  Each  pixel 
belongs  to  one  and  only  one  segment.  The  segments  fall  into  several 
classes.  For  example,  in  a  picture  of  a  house  the  classes  might  be 
brick,  sky,  grass,  shadow  and  brush.  Note  that  there  might  be  several 
separate  areas  of,  say,  grass.  Each  of  these  areas  is  a  segment, 
but  they  all  belong  to  the  class  called  "grass." 

The  statistical  model  accompanying  this  conceptual  model  is  as  follows: 

--with  each  class  of  segment  is  associated  a  probability 
distribution  for  the  feature  vector  X; 

— with  each  pixel  is  associated  a  label  which,  were  it  known  to  us, 
would  tell  us  which  class  of  segment  the  pixel  belongs  to. 

Each  pixel  thus  gives  rise  to  a  pair  (X,y) ,  where  X  is  observable 

and  y  is  not.  In  the  context  of  this  statistical  model 

segmentation  is  estimation  of  the  set  of  labels. 

Often  one  considers  parametric  models,  in  which  the  class-conditional 
probability  functions  fc(x)  are  assumed  known,  except  for  the  values 
of  distributional  parameters.  That  is, 

fc(x)  -  f(x;0c), 

where  @c  is  the  parameter.  E.g.,  in  the  multivariate  Gaussian  case 
3C  consists  of  the  mean  and  covariance  matrix  for  class  c. 

In  order  to  model  the  spatial  correlation  of  images,  one  can  assume 
that  the  labels  y^:  form  a  stochastic  process,  say  a  Markov  process. 

One  reads  through  the  array  in  a  fixed  order,  say,  first  row,  left  to 
right,  second  row,  left  to  right,  and  conditions  a  given  pixel  on 
pixels  preceding  it  in  this  ordering.  Here  a  first-order  Markov 
process  would  be  one  where  a  given  pixel  is  conditioned  on  the  pixels 
to  the  north  and  west  of  it.  Thus  the  transition  probability  matrix 
has  the  following  form  for  k-3  classes  of  segment. 
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Pixel  to 
north  west 
1  1 

1  2 

1  3 

2  1 

2  2 

2  3 

3  1 

3  2 

3  3 


The  total  number  o£  parameters,  distributional  parameters  and 
transition  probabilities,  is  large.  But,  with  very  large  datasets  one 
need  not  necessarily  shy  away  from  models  with  many  parameters. 

The  algorithm  developed  £or  segmenting  images  according  to  this 
model  is  similar  to  that  £or  segmenting  time  series,  except  now  the 
transition-probability  matrix  is  more  complicated. 

As  a  sample  "image"  the  Fisher  iris  data  were  used.  This  dataset 

> 

consists  of  4  features  measured  on  150  flowers,  50  in  each  of  three  species. 
To  form  an  iswge  the  150  flowers  were  arranged  into  a  15  x  10  rectangular 
array,  rows  1-5  being  species  1,  rows  6-10  being  species  2,  rows  11-15  being 
species  3.  This  means  that  the  true  segmentation  is  as  follows. 


Center  pixel 


1 

2 

3 

Pll.l 

Pll,2 

Pll,3 

P12.1 

P12.2 

Pl2,3 

P13,1 

P13.2 

Pl3,3 

P21.1 

P21,2 

P21,3 

P22.1 

P22.2 

P22.3 

P23.1 

P23.2 

P23.3 

P31,i 

P31.2 

P31.3 

P32.1 

P32,2 

P32.3 

P33.1 

P33.2 

P33.3 

•a 

m 
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TRUE  SEGMENTATION: 


10  2 

11  3 

12  3 

13  3 

14  3 

15  3 


COLUMN: 

2  3  4  5 
1111 
1111 
1111 
1111 
1111 
2  2  2  2 
2  2  2  2 
2  2  2  2 
2  2  2  2 
2  2  2  2 

3  3  3  3 
3  3  3  3 
3  3  3  3 
3  3  3  3 
3  3  3  3 


6  7  8  9  10 
11111 
11111 
11111 
11111 
11111 
2  2  2  2  2 
2  2  2  2  2 
2  2  2  2  2 
2  2  2  2  2 
2  2  2  2  2 
3  3  3  3  3 
3  3  3  3  3 
3  3  3  3  3 
3  3  3  3  3 
3  3  3  3  3 


Below  are  given  results  obtained  by  starting  with  initial  means  equal 
to  the  measurements  on  flowers  50,  100  and  150.  (These  are  easy  for 
the  algorithm  in  the  sense  that  they  are  in  fact  from  the  three 
different  species,  but  not  so  easy  as,  e.g.,  flowers  1,51  and  101, 
which  are  further  apart.  Starting  with  means  that  are  from  correct 
classes  is  analogous  to  most  applications,  where  something  is  known 
in  advance  about  the  characteristics  of  different  classes  of  segment.) 
The  results  in  successive  iterations  were  as  follows.  Convergence  was 
reached  on  the  fourth  iteration,  i.s.,  on  that  iteration  no  pixel 
changed  class. 


flag* 


-  *  «•  - 


•  I 


Sclove:  On  Segaentation  of  Digital  Iaages  15 


SEGMENTATION  ON  ITERATION  1: 


ROW:  COLUMN: 

123456789  10 
11111111111 
2  1111111111 

3  1111111111 

4  1111111111 

5  1111111111 
63332323232 
72223222222 
83232223322 
92223233222 

10  2322222222 

11  3333332333 

12  3333333333 


CONFUSION  MATRIX: 


True  Class 


1 

2 

3 

1 

50 

0 

0 

50 

Label  2 

0 

36 

1 

37 

3 

0 

14 

49 

63 

50 

50 

50 

150 

13  3333333333 


14  3333333333 

15  3333333333 


SEGMENTATION  ON  ITERATION  2: 

ROW:  COLUMN:  CONFUSION  MATRIX: 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

True  Class 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1  2 

3 

2 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

— 

— 

3 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1  50  0 

0  50 

4 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

Label  2  0  42 

1  43 

5 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

3  0  8 

49  57 

6 

2 

3 

3 

2 

3 

2 

3 

2 

3 

2 

- - 

7 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

50  50 

50  150 

8 

3 

2 

2 

2 

2 

2 

2 

3 

2 

2 

9 

2 

2 

2 

3 

2 

2 

2 

2 

2 

2 

10 

2 

2 

2 

2 

2 

2 

2 

2 

2 

a* 

11 

3 

3 

3 

3 

3 

3 

2 

3 

3 

3 

12 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

13 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

14 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

15 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 
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SEGMENTATION  ON  ITERATION  3: 


ROW:  COLUMN:  CONFUSION  MATRIX: 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

Tru« 

i  Class 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

2 

3 

2 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

- - 

- - 

3 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1  50 

0 

0  50 

4 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

Label  2  0 

44 

0  44 

5 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

3  0 

6 

50  56 

6 

2 

3 

2 

3 

2 

3 

2 

3 

2 

3 

7 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

50 

50 

50  150 

8 

3 

2 

2 

2 

2 

2 

2 

2 

2 

2 

9 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

10 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

11 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

12 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

13 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

14 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

15 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

Again,  a  good  faaturo  of  tha  algorithms  is  that  isolated  labels 
tend  to  disappear,  solving  problems  such  as  erroneously 
classifying  an  isolated  pixel  as  corn  in  what  is  obviously  a  wheat 
field. 

All  computations  reported  here  were  carried  out  using  FORTRAN  computer 
programs  written  by  the  author.  Tapes  of  these  programs  were  sent  to  the 
Statistics  Program  at  the  Office  of  Naval  Research  (ONR)  for  deposit  in 
the  Computer  Division  of  the  Naval  Research  Laboratories.  New  versions 
of  these  programs,  as  well  as  other  segmentation  programs,  are  being 
developed.  It  is  planned  to  publish  these  in  Chen  (1984). 

Acknowledgement .  This  research  was  supported  by  Office  of  Naval 
Research  Contract  N00014-80-C~0408,  Task  NR042-443,  and  Army  Research 
Office  Contract  DAAG29-82-K-0155 ,  at  the  University  of  Illinois  at 
Chicago. 
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TABLE  1.  GNP.  Units:  billions  of  currant  (non-constant)  dollars 
(from  Nelson  (1973),  pp.  100-101) 

Quarter 

1  2  3  4  1  2  3  4 

1947-48  GNP  224  228  232  242  248  256  263  264* 

change  4.0  4.2  10.3  5.9  7.6  6.9  1.4  -5.4 

1949-50  GNP  259  255  257  255  266  27  293  305~ 

change  -3.3  1.9  -2.1  11.0  9.4  17.7  11.4  13.5 

1951-52  GNP  318  326  333  337  340  339  346  358~ 

change  7.8  7.0  4.1  2.6  -0.4  6.5  12.1  6.5 

1953-54  GNP  364  368  366  361  361  360  365  373 

change  3.3  -1.7  -5.0  -0.1  -0.3  4.3  8.7  12.8 

1955-56  GNP  386  394  403  409  4H  416  421  430 

change  8.2  8.1  6.3  1.8  5.6  4.4  8.9  7. A 

1957-58  GNP  437  440  446  442  435  438  45l  464 

change  3.0  6.4  -4.8  -6.8  3.6  13.1  13.0  9.6 

1959-60  GNP  474  487  484  491  503  505  504  503 

change  12.9  -2.9  .  .  6.5  12.5  1.7  -0.5  -0..9  0.3 

1961-62  GNP  504  515  524  538  548  557  564  572" 

change  11.3  9.3  13-5  10.1  9.4  7.2  7.6  5.4 

1963-64  GNP  577  584  595  606  618  628  639  645 

change  6.8  10.5  11.1  11.9  10.3  10.9  6.2  17.7 

676  691  710  730  743  756  771 

15.4  18.9  19.5  13.8  12.6  14.8  13.5 


1965-66  GNP  663 

change  12.9 
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TABLE  2.  Fitting  models 

Model  AIC 

a 

Segmentation,  2  classes  481.4 

Segmentation,  3  classes,  full  trans.  prob.  matrix  483.6 

b 

Segmentation,  3  classes,  sparse  trans.  prob.  matrix  488.5 
Segmentation,  4  classes,  full  trans.  prob.  matrix  507.1 

b 

Segmentation,  4  classes,  sparse  trans.  prob.  matrix  486.8 


+ 


Segmentation,  5 
Segmentation,  5 
Segmentation,  6 


classes, 

classes, 

classes. 


full  trans.  prob.  matrix 
sparse  trans.  prob.  matrix 
full  trans.  prob.  matrix 


506.5 

c 

stopped 

c 

stopped 


a.  Optimum,  among  segmentation  models  considered. 

b.  Allows  transitions  only  to  adjacent  states. 

c.  Stopped,  i.e.,  the  algorithm  reached  an  iteration  where  it 


allocated 


no  observations  to  one  of  the  classes. 
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The  problem  of  partitioning  a  digital  image  into  segments  is 
considered.  First  the  procedure  is  illustrated  for  the  analogous 
one-dimensional  problem,  namely,  segmentation  of  time  series.  Then 
similar  ideas  are  applied  to  the  segmentation  of  digital  images. 

The  segments  are  considered  as  falling  into  classes.  A  probability 
distribution  is  associated  with  each  class  of  segment.  Parametric 
families  of  distributions  are  considered,  a  set  of  parameter  values 
being  associated  with  each  class.  With  each  observation  is  associated 
an  unobservable  label,  indicating  from  which  class  the  observation 
arose.  The  label  process  is  modeled  as  a  Markov  process.  Segmentation 
algorithms  are  obtained  by  applying  a  method  of  iterated  maximum 
likelihood  (a  relaxation  method)  to  the  resulting  likelihood  function. 

In  this  paper  special  attention  is  given  to  the  situation  in  which  the 
observations  are  conditionally  Independent,  given  the  labels.  Numerical 
examples  are  given.  Choice  of  the  number  of  classes,  using 
statistical  model-selection  criteria,  is  illustrated. 
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